Background: Assessment of left ventricular (LV) function from myocardial perfusion SPECT (MPS) depends on accurate myocardial segmentation. The aim of this paper is to develop and validate a new method that combines deep learning with shape priors to accurately extract the LV myocardium for automatic measurement of LV functional parameters. Methods: A segmentation architecture integrating a three-dimensional (3D) V-Net with a shape deformation module was developed. Shape priors generated by a dynamic programming (DP) algorithm were used to constrain and guide the model output during training, accelerating convergence and improving performance. Stratified 5-fold cross-validation was used to train and validate the model. Results: The results of the proposed method agree well with the ground truth. The proposed model achieved Dice similarity coefficients (DSC) of 0.9573 (0.0244), 0.9821 (0.0137), and 0.9903 (0.0041), and Hausdorff distances (HD) of 6.7529 (2.7334) mm, 7.2507 (3.1952) mm, and 7.6122 (3.0134) mm for extracting the endocardium, myocardium, and epicardium, respectively. Conclusion: The proposed method achieves high accuracy in extracting LV myocardial contours and assessing LV function.
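A minimal sketch of how such a shape prior could constrain training, assuming PyTorch: a soft Dice loss is combined with an L1 penalty that pulls the network output toward the DP-generated prior mask. The penalty form and the `lambda_prior` weighting are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred, target, eps=1e-6):
    # pred, target: (B, 1, D, H, W) soft predictions and binary ground truth
    inter = (pred * target).sum(dim=(2, 3, 4))
    denom = pred.sum(dim=(2, 3, 4)) + target.sum(dim=(2, 3, 4))
    return 1.0 - ((2.0 * inter + eps) / (denom + eps)).mean()

def shape_constrained_loss(pred, target, prior, lambda_prior=0.1):
    # `prior` is a binary mask produced offline by the DP contour algorithm;
    # the L1 term nudges the output toward the prior during training, which
    # is one plausible way to realize the reported constrain-and-guide effect.
    return dice_loss(pred, target) + lambda_prior * F.l1_loss(pred, prior)
```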
Recently, deep-learning-based super-resolution methods have achieved strong performance, but they mainly focus on training a single generalized deep network by feeding it a large number of samples. Intuitively, however, each image has its own representation, and an adaptive model is to be expected. To this end, we propose a novel image-specific convolutional kernel modulation (IKM) that exploits the global contextual information of images or features to generate attention weights for adaptively modulating the convolutional kernels, which outperforms vanilla convolution and several existing attention mechanisms when embedded into state-of-the-art architectures without introducing additional parameters. In particular, to optimize IKM under mini-batch training, we introduce an image-specific optimization (ISO) algorithm that is more effective than conventional mini-batch SGD. Furthermore, we investigate the effect of IKM on state-of-the-art architectures and propose a new backbone with U-style residual learning and hourglass dense block learning, termed the U-Hourglass Dense Network (U-HDN), which is shown theoretically and experimentally to maximize the effectiveness of IKM. Extensive experiments on single-image super-resolution demonstrate that the proposed method outperforms existing methods. Code is available at github.com/yuanfeihuang/ikm.
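A hedged sketch of the kernel-modulation idea in PyTorch: global average pooling supplies per-image context, a linear projection (an assumption here; the paper's exact mechanism may differ) turns it into per-output-channel attention weights, and each sample convolves with its own modulated kernel via a grouped convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IKMConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.02)
        self.to_mod = nn.Linear(in_ch, out_ch)  # context -> modulation (assumed form)
        self.pad = k // 2

    def forward(self, x):
        b, c, h, w = x.shape
        ctx = x.mean(dim=(2, 3))                  # (B, C) global context
        mod = torch.sigmoid(self.to_mod(ctx))     # (B, out_ch) attention weights
        # Modulate the shared kernel per sample, then run all samples at once
        # as a grouped convolution over a flattened batch.
        w_mod = self.weight.unsqueeze(0) * mod.view(b, -1, 1, 1, 1)
        out = F.conv2d(x.reshape(1, b * c, h, w),
                       w_mod.reshape(b * self.weight.size(0), c, *self.weight.shape[2:]),
                       padding=self.pad, groups=b)
        return out.view(b, -1, h, w)
```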
Relying heavily on iterative estimation or optimization of a degradation model from scratch, existing blind super-resolution (SR) methods are usually time-consuming and inefficient, since the degradation estimation starts from a blind initialization and lacks interpretable degradation priors. To address this, this paper proposes a transitive learning method for blind SR using an end-to-end network, without any additional iterations at inference, and explores an effective representation of unknown degradations. First, we analyze and demonstrate the transitivity of degradations as interpretable prior information for indirectly inferring unknown degradation models, covering the widely used additive and convolutive degradations. We then propose a novel Transitive Learning method for blind Super-Resolution (TLSR) that handles unknown degradations by adaptively inferring a transitive transformation function, with no iterative operations at inference. Specifically, the end-to-end TLSR network consists of a degree-of-transitivity (DoT) estimation network, a homogeneous feature extraction network, and a transitive learning module. Quantitative and qualitative evaluations on blind SR tasks show that the proposed TLSR achieves superior performance with lower complexity than state-of-the-art blind SR methods. The code is available at github.com/yuanfeihuang/tlsr.
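One illustrative reading of the transitive-learning idea, sketched in PyTorch under loose assumptions: a small network estimates a degree of transitivity (DoT) from input features, which then interpolates between two restoration branches associated with the end-points of a degradation range. The two-branch linear interpolation is a simplification of the paper's transitive transformation function, not its actual design.

```python
import torch
import torch.nn as nn

class TLSRHead(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Degree-of-transitivity (DoT) estimator: features -> scalar in [0, 1].
        self.dot_net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 1), nn.Sigmoid())
        # Two branches for the end-points of a degradation range (assumed setup).
        self.branch_mild = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch_severe = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feat):
        t = self.dot_net(feat).view(-1, 1, 1, 1)   # estimated DoT per image
        # Transition between the branches without any iterative refinement.
        return (1 - t) * self.branch_mild(feat) + t * self.branch_severe(feat)
```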
Schizophrenia is a chronic neuropsychiatric disorder that causes distinct structural alterations within the brain. We hypothesized that applying deep learning to a structural neuroimaging dataset could detect disease-related alterations and improve classification and diagnostic accuracy. We tested this hypothesis using a single, widely available, routine T1-weighted MRI scan, from which we extracted the 3D whole-brain structure using standard post-processing methods. A deep learning model was then developed, optimized, and evaluated on three open datasets containing T1-weighted MRI scans of patients with schizophrenia. Our proposed model outperformed the benchmark model, which was also trained on structural MR images using a 3D CNN architecture. Our model distinguished patients with schizophrenia from healthy controls on unseen structural MRI scans almost perfectly (area under the ROC curve = 0.987). Regional analysis localized subcortical regions and the ventricles as the most predictive brain regions. Subcortical structures play a critical role in cognitive, affective, and social functions in humans, and structural abnormalities in these regions have been associated with schizophrenia. Our findings confirm that schizophrenia is associated with widespread alterations in subcortical brain structures, and that subcortical structural information provides prominent features for diagnostic classification. Taken together, these results further demonstrate the potential of deep learning to improve the diagnosis of schizophrenia and to identify its structural neuroimaging signatures from a single, standard T1-weighted brain MRI.
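For orientation, a minimal 3D CNN classifier of the kind described could look as follows in PyTorch; the depths, channel widths, input handling, and pooling scheme are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class Brain3DCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.BatchNorm3d(16), nn.ReLU(inplace=True),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.BatchNorm3d(32), nn.ReLU(inplace=True),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, 3, padding=1), nn.BatchNorm3d(64), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1))
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):
        # x: (B, 1, D, H, W) preprocessed whole-brain T1-weighted volume
        return self.classifier(self.features(x).flatten(1))
```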
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
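A hedged sketch of the implicit alignment idea in PyTorch: image and LiDAR tokens each receive a position embedding derived from 3D coordinates, and a plain transformer decoder attends over the concatenated tokens with learned object queries. The MLP position encoder, token dimensions, and box parameterization below are assumptions, not CMT's actual modules.

```python
import torch
import torch.nn as nn

class CrossModalDecoder(nn.Module):
    def __init__(self, dim=256, n_queries=300):
        super().__init__()
        # Shared encoder from 3D coordinates to position embeddings (assumed form).
        self.pos_mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(inplace=True),
                                     nn.Linear(dim, dim))
        self.queries = nn.Parameter(torch.randn(n_queries, dim))
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        # Assumed box parameterization: center (3) + size (3) + yaw sin/cos + velocity.
        self.box_head = nn.Linear(dim, 10)

    def forward(self, img_tokens, img_xyz, pts_tokens, pts_xyz):
        # *_tokens: (B, N, dim); *_xyz: (B, N, 3) 3D coordinates per token.
        mem = torch.cat([img_tokens + self.pos_mlp(img_xyz),
                         pts_tokens + self.pos_mlp(pts_xyz)], dim=1)
        q = self.queries.unsqueeze(0).expand(img_tokens.size(0), -1, -1)
        return self.box_head(self.decoder(q, mem))   # (B, n_queries, 10)
```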
Dataset distillation has emerged as a prominent technique for improving data efficiency when training machine learning models. It encapsulates the knowledge of a large dataset into a smaller synthetic dataset, and a model trained on this distilled dataset can attain performance comparable to a model trained on the original training dataset. However, existing dataset distillation techniques mainly aim at achieving the best trade-off between resource-usage efficiency and model utility, and the security risks stemming from them have not been explored. This study performs the first backdoor attack against models trained on data distilled by dataset distillation, in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent them.
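A minimal sketch of NAIVEATTACK-style poisoning in PyTorch: a small trigger patch is stamped into raw images before distillation runs, so the backdoor is carried into the synthetic set. The patch size, location, and relabeling step below are assumptions for illustration.

```python
import torch

def add_trigger(images, trigger, size=4):
    # Stamp a trigger patch into the bottom-right corner of each image.
    # images: (B, C, H, W); trigger: (C, size, size)
    poisoned = images.clone()
    poisoned[:, :, -size:, -size:] = trigger
    return poisoned

# Assumed usage: poison a fraction of the raw pool, relabel it to the
# attacker's target class, then hand everything to an off-the-shelf
# distillation method unchanged.
# idx = torch.randperm(len(images))[: int(0.1 * len(images))]
# images[idx] = add_trigger(images[idx], trigger)
# labels[idx] = target_class
```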
Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility of deep models transferring knowledge from language to music, by fine-tuning large language models pre-trained on massive text corpora on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest state-of-the-art models (GPT-3) is capable of generating reasonable drum grooves, while a model that is not pre-trained (Transformer) shows no such ability beyond naive repetition. Evaluating generated music is a challenging task, and evaluating drum grooves, which have little precedent in the literature, even more so. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT-3 against those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.
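One plausible way to serialize drum MIDI as text for language-model fine-tuning, sketched with the `mido` package; the `pitch@time` token format here is an assumption, not the paper's actual encoding.

```python
import mido

def drums_to_text(path):
    # Flatten a drum MIDI file into a space-separated event string,
    # e.g. "36@0.00 42@0.25 38@0.50" (General MIDI drum pitches at seconds).
    events, t = [], 0.0
    for msg in mido.MidiFile(path):   # iterating yields messages with time in seconds
        t += msg.time
        if msg.type == "note_on" and msg.velocity > 0:
            events.append(f"{msg.note}@{t:.2f}")
    return " ".join(events)
```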
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS and its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully exploit the relationship between support and query features within a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at both the feature level and the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features, and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on novel classes improves significantly over our strong baseline. Additionally, our framework can be easily extended to incremental FSIS with minor modifications. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shot counts; e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on few-shot object detection. Code and models will be available.
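A hedged sketch of the mask-based dynamic weighting step in PyTorch: masked average pooling over support features yields a class center, whose sigmoid-gated channel response re-weights the query features. The sigmoid gating and shape conventions are assumptions for illustration.

```python
import torch

def reweight_query(query_feat, support_feat, support_mask):
    # query_feat: (B, C, H, W); support_feat: (S, C, H, W); support_mask: (S, 1, H, W)
    # Masked average pooling: collapse each support image to a C-dim vector
    # using only the pixels inside its instance mask.
    masked = (support_feat * support_mask).sum(dim=(2, 3))
    center = masked / support_mask.sum(dim=(2, 3)).clamp(min=1e-6)   # (S, C)
    center = center.mean(dim=0)                                      # (C,) class center
    # Channel-wise gate derived from the class center (assumed gating form).
    gate = torch.sigmoid(center).view(1, -1, 1, 1)
    return query_feat * gate
```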
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. It is therefore difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness considerations. As a consequence, the student model usually inherits and even exaggerates the bias of the teacher GNN. To handle this problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. We then propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures and can thus be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
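For intuition, a fairness-regularized distillation loss could be sketched in PyTorch as below: standard soft-label KD plus a penalty on the gap between group-wise mean predictions (a demographic-parity surrogate). RELIANT's actual debiasing mechanism is more involved; this penalty is an illustrative stand-in, and the `lam` and temperature values are assumptions.

```python
import torch
import torch.nn.functional as F

def fair_kd_loss(student_logits, teacher_logits, group, lam=0.5, T=2.0):
    # Soft-label knowledge distillation (Hinton-style, temperature-scaled).
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * T * T
    # Demographic-parity surrogate: penalize the gap between mean predictions
    # of the two sensitive groups (assumes both groups appear in the batch).
    probs = F.softmax(student_logits, dim=1)
    gap = (probs[group == 0].mean(dim=0) - probs[group == 1].mean(dim=0)).abs().sum()
    return kd + lam * gap
```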
This paper focuses on designing efficient models with low parameter counts and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetV2 and the effective Transformer in ViT, inductively abstracting a general Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when instantiations share the same framework. Motivated by this observation, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency for modeling short-distance dependencies and Transformer-like dynamic modeling capability for learning long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods; e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing SoTA CNN- and Transformer-based models while trading off model accuracy and efficiency well.
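A hedged sketch of an iRMB-like block in PyTorch: a depthwise convolution supplies the CNN-like short-distance modeling and multi-head self-attention the Transformer-like long-distance modeling, both inside the expanded space of an inverted residual. The expansion ratio, head count, and exact composition are assumptions, not EMO's configuration.

```python
import torch
import torch.nn as nn

class iRMB(nn.Module):
    def __init__(self, dim, expand=4, heads=4):
        super().__init__()
        hid = dim * expand
        self.expand = nn.Sequential(nn.Conv2d(dim, hid, 1), nn.GELU())
        # Depthwise conv: cheap short-distance (local) dependency modeling.
        self.dw = nn.Conv2d(hid, hid, 3, padding=1, groups=hid)
        # Self-attention: dynamic long-distance interaction modeling.
        self.attn = nn.MultiheadAttention(hid, heads, batch_first=True)
        self.project = nn.Conv2d(hid, dim, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        y = self.dw(self.expand(x))
        t = y.flatten(2).transpose(1, 2)              # (B, HW, hid) tokens
        t, _ = self.attn(t, t, t)
        y = y + t.transpose(1, 2).reshape(b, -1, h, w)
        return x + self.project(y)                    # inverted residual
```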